What factors or features create a good-tasting white wine?
(Intercept)         sulphates           alcohol  volatile.acidity
     2.8030            0.4157            0.3250           -1.9629

Residuals:
    Min      1Q  Median      3Q     Max
-3.3186 -0.4854 -0.0406  0.4914  3.1555

Coefficients:
                  Estimate Std. Error t value Pr(>|t|)
(Intercept)       2.802988   0.109706  25.550  < 2e-16 ***
sulphates         0.415712   0.096580   4.304 1.71e-05 ***
alcohol           0.325028   0.008972  36.229  < 2e-16 ***
volatile.acidity -1.962858   0.109589 -17.911  < 2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.7707 on 4894 degrees of freedom
Multiple R-squared:  0.2431,  Adjusted R-squared:  0.2426
F-statistic: 523.9 on 3 and 4894 DF,  p-value: < 2.2e-16
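The derived numbers in this summary follow from the standard OLS formulas, so they can be checked by hand: each t value is the estimate divided by its standard error, the residual degrees of freedom are n - p - 1 with n = 4898 observations (from the str() output) and p = 3 predictors, and the adjusted R-squared and F-statistic follow from the multiple R-squared. The sketch below recomputes them from the printed, rounded values, so the last digit can differ slightly. The p-value uses a normal approximation to the t distribution, which is an assumption (reasonable with 4894 degrees of freedom) rather than what R computes exactly.

```python
from statistics import NormalDist

# Values copied from the summary() output above (rounded as printed).
n, p = 4898, 3                      # observations, predictors (excluding intercept)
r2 = 0.2431                         # Multiple R-squared
coefs = {                           # name: (estimate, standard error)
    "(Intercept)": (2.802988, 0.109706),
    "sulphates": (0.415712, 0.096580),
    "alcohol": (0.325028, 0.008972),
    "volatile.acidity": (-1.962858, 0.109589),
}

df_resid = n - p - 1                                  # residual degrees of freedom
adj_r2 = 1 - (1 - r2) * (n - 1) / df_resid            # adjusted R-squared
f_stat = (r2 / p) / ((1 - r2) / df_resid)             # overall F-statistic


def t_and_p(estimate, std_error):
    """t value and two-sided p-value (normal approximation, not the exact t distribution)."""
    t = estimate / std_error
    p_value = 2 * (1 - NormalDist().cdf(abs(t)))
    return t, p_value


print(df_resid, round(adj_r2, 4), round(f_stat, 1))
for name, (est, se) in coefs.items():
    t, p_value = t_and_p(est, se)
    print(f"{name:>16}  t = {t:8.3f}  p ~ {p_value:.2e}")
```

The first line prints 4894, 0.2426 and 523.9, matching the summary; for sulphates the approximate p-value comes out close to the reported 1.71e-05.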
##      Min.   1st Qu.    Median      Mean   3rd Qu.      Max.
## -3.156000 -0.491700  0.040260 -0.000303  0.485100  3.318000
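In the normal Q-Q plot of the residuals, the points deviate only slightly from the straight reference line, which suggests the residuals are approximately normally distributed. There is some departure from the line in the tails.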
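A normal Q-Q plot pairs each sorted residual with the standard normal quantile at the same plotting position; if the residuals are roughly normal, those pairs fall close to a straight line. A minimal sketch of how the coordinates can be computed, assuming the same plotting positions as R's ppoints() (used by qqnorm): (i - a)/(n + 1 - 2a) with a = 3/8 for n <= 10 and a = 1/2 otherwise. The helper names are hypothetical, and the five values in the usage lines are just the first residuals shown in the str() output, used purely for illustration.

```python
from statistics import NormalDist


def plotting_positions(n):
    """Probabilities as in R's ppoints(): a = 3/8 if n <= 10, else 1/2."""
    a = 3 / 8 if n <= 10 else 0.5
    return [(i - a) / (n + 1 - 2 * a) for i in range(1, n + 1)]


def qq_points(sample):
    """Return (theoretical normal quantile, observed value) pairs for a Q-Q plot."""
    ordered = sorted(sample)
    probs = plotting_positions(len(ordered))
    theoretical = [NormalDist().inv_cdf(prob) for prob in probs]
    return list(zip(theoretical, ordered))


# Illustration only: the first five values of the residuals column from str().
for theo, obs in qq_points([-0.68, -0.495, -0.281, -0.265, -0.265]):
    print(f"{theo:7.3f} {obs:7.3f}")
```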
The goal here is to focus on a set of defined features:
## 'data.frame': 4898 obs. of 15 variables:
## $ X : int 1 2 3 4 5 6 7 8 9 10 ...
## $ fixed.acidity : num 7 6.3 8.1 7.2 7.2 8.1 6.2 7 6.3 8.1 ...
## $ volatile.acidity : num 0.27 0.3 0.28 0.23 0.23 0.28 0.32 0.27 0.3 0.22 ...
## $ citric.acid : num 0.36 0.34 0.4 0.32 0.32 0.4 0.16 0.36 0.34 0.43 ...
## $ residual.sugar : num 20.7 1.6 6.9 8.5 8.5 6.9 7 20.7 1.6 1.5 ...
## $ chlorides : num 0.045 0.049 0.05 0.058 0.058 0.05 0.045 0.045 0.049 0.044 ...
## $ free.sulfur.dioxide : num 45 14 30 47 47 30 30 45 14 28 ...
## $ total.sulfur.dioxide: num 170 132 97 186 186 97 136 170 132 129 ...
## $ density : num 1.001 0.994 0.995 0.996 0.996 ...
## $ pH : num 3 3.3 3.26 3.19 3.19 3.26 3.18 3 3.3 3.22 ...
## $ sulphates : num 0.45 0.49 0.44 0.4 0.4 0.44 0.47 0.45 0.49 0.45 ...
## $ alcohol : num 8.8 9.5 10.1 9.9 9.9 10.1 9.6 8.8 9.5 11 ...
## $ quality : int 6 6 6 6 6 6 6 6 6 6 ...
## $ quality_predicated : num 5.32 5.51 5.72 5.74 5.74 ...
## $ residuals : num -0.68 -0.495 -0.281 -0.265 -0.265 ...
SO2 plays a very important role in preventing oxidation and maintaining a wine’s freshness.
Sweet wines get the biggest doses because sugar combines with and binds a high proportion of any SO2 added. To get the same level of free sulphur dioxide, the total concentration has to be higher than for dry wines. http://www.morethanorganic.com/sulphur-in-the-bottle
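The dataset's total.sulfur.dioxide counts both the free and the bound forms of SO2, so the bound part is total minus free. One candidate for a meaningful ratio variable is therefore the share of SO2 that stays free. This is a sketch under that assumption; so2_fractions is a hypothetical helper, and the sample rows are the first three wines from the str() output.

```python
def so2_fractions(free_so2, total_so2):
    """Split total SO2 into bound SO2 and the free share (same units as the dataset)."""
    if total_so2 <= 0 or free_so2 < 0 or free_so2 > total_so2:
        raise ValueError("expected 0 <= free_so2 <= total_so2 and total_so2 > 0")
    bound_so2 = total_so2 - free_so2          # total = free + bound
    return {"bound": bound_so2, "free_share": free_so2 / total_so2}


# First three wines from the str() output: (residual.sugar, free SO2, total SO2)
rows = [(20.7, 45, 170), (1.6, 14, 132), (6.9, 30, 97)]
for sugar, free, total in rows:
    parts = so2_fractions(free, total)
    print(f"sugar={sugar:5.1f}  bound={parts['bound']:5.1f}  free share={parts['free_share']:.3f}")
```

Plotting free_share against residual.sugar and quality over the full dataset would show whether sweeter wines really carry more bound SO2 here.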
Reference: text document from the dataset
Yes, new variables have been added to the dataset:
- predicted values for quality (quality_predicated)
- residuals (residuals)
- Let’s create a ratio with a couple of variables … it has to be meaningful
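The predicted values are the linear combination of the three predictors weighted by the estimated coefficients. A minimal sketch using the rounded estimates from the summary (predict_quality and residual are hypothetical helper names). Note that the residuals column appears to be predicted minus observed quality (first wine: 5.32 - 6 = -0.68), the opposite sign of R's residuals(). That would also explain why the summary of residuals above has its minimum and maximum swapped and negated relative to the lm() summary.

```python
# Rounded estimates from the summary() output above.
COEFFICIENTS = {
    "intercept": 2.802988,
    "sulphates": 0.415712,
    "alcohol": 0.325028,
    "volatile_acidity": -1.962858,
}


def predict_quality(sulphates, alcohol, volatile_acidity, coef=COEFFICIENTS):
    """Fitted quality from the three-predictor linear model."""
    return (coef["intercept"]
            + coef["sulphates"] * sulphates
            + coef["alcohol"] * alcohol
            + coef["volatile_acidity"] * volatile_acidity)


def residual(observed, predicted):
    """R's convention: observed minus fitted."""
    return observed - predicted


# First wine in the str() output: sulphates 0.45, alcohol 8.8, volatile acidity 0.27, quality 6
fitted = predict_quality(0.45, 8.8, 0.27)
print(round(fitted, 2), round(residual(6, fitted), 2))
```

This prints 5.32 and 0.68; the str() output stores -0.68 for the same wine, consistent with the flipped sign convention.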
Yes. I would convert quality to a factor and then remove the factor again. A few plots work better when quality is a factor instead of an integer.
The residual sugar plot was a surprise: it suggests that the sweeter the wine (the more residual sugar it contains), the lower its quality rating tends to be.
We had features with negative slopes and features with positive slopes when plotted against quality.
Yes/No
Alcohol and Quality.
Features that strengthen each other… are we discussing highly correlated variables, where a rise in one feature is accompanied by a rise in the correlated feature? Or are we trying to convey something else here?
The plots are a mess right now.
Yes. I created a linear model of quality with sulphates, alcohol, and volatile acidity as predictors.
Results listed above.
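For reference, a model like this one is an ordinary least-squares fit. The sketch below solves the normal equations (XᵀX)β = Xᵀy by Gaussian elimination in plain Python. It is an illustrative stand-in, not the code that produced the output above: R's lm() uses a QR decomposition, which is numerically more stable. fit_ols and r_squared are hypothetical names, and the usage check uses predictor rows from the str() output with a made-up, noise-free response.

```python
def fit_ols(rows, y):
    """Least-squares coefficients [intercept, b1, ..., bk] via the normal equations."""
    X = [[1.0, *row] for row in rows]          # prepend an intercept column
    k = len(X[0])
    # Augmented system [X^T X | X^T y].
    A = [[sum(x[i] * x[j] for x in X) for j in range(k)]
         + [sum(x[i] * yi for x, yi in zip(X, y))] for i in range(k)]
    # Forward elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        if abs(A[pivot][col]) < 1e-12:
            raise ValueError("design matrix is rank deficient")
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(col + 1, k):
            factor = A[r][col] / A[col][col]
            for c in range(col, k + 1):
                A[r][c] -= factor * A[col][c]
    # Back substitution.
    beta = [0.0] * k
    for i in reversed(range(k)):
        beta[i] = (A[i][k] - sum(A[i][j] * beta[j] for j in range(i + 1, k))) / A[i][i]
    return beta


def r_squared(rows, y, beta):
    """Multiple R-squared: 1 - SS_res / SS_tot."""
    fitted = [beta[0] + sum(b * v for b, v in zip(beta[1:], row)) for row in rows]
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1 - ss_res / ss_tot


# Synthetic check: (sulphates, alcohol, volatile.acidity) rows from the str() output,
# response built from arbitrary coefficients (1, 2, 0.5, -3) with no noise.
rows = [(0.45, 8.8, 0.27), (0.49, 9.5, 0.30), (0.44, 10.1, 0.28),
        (0.40, 9.9, 0.23), (0.47, 9.6, 0.32), (0.45, 11.0, 0.22)]
y = [1 + 2 * s + 0.5 * a - 3 * v for s, a, v in rows]
beta = fit_ols(rows, y)
print([round(b, 6) for b in beta])   # expected: [1.0, 2.0, 0.5, -3.0]
```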
The dataset contains relatively few wines with low or high quality ratings.
For instance, the citric acid and residual sugar levels are more important in white wine.
Moreover, the volatile acidity has a negative impact, since acetic acid is the key ingredient in vinegar. The most intriguing result is the high importance of sulphates, ranked first for both cases. Oenologically this result could be very interesting. An increase in sulphates might be related to the fermenting nutrition, which is very important to improve the wine aroma.
MIT results:
- sulphates: 23%
- alcohol: 14%
- residual sugar: 13%
- citric acid: 10%
- total sulfur dioxide: 9%
- free sulfur dioxide: 8.5%
- volatile acidity: 8%
- density: 7%
- pH: 6%
- chlorides: 3%
- fixed acidity: 2%
Pearson’s r correlation strength:
- +.70 or higher: very strong positive relationship
- +.40 to +.69: strong positive relationship
- +.30 to +.39: moderate positive relationship
- +.20 to +.29: weak positive relationship
- +.01 to +.19: no or negligible relationship
- -.01 to -.19: no or negligible relationship
- -.20 to -.29: weak negative relationship
- -.30 to -.39: moderate negative relationship
- -.40 to -.69: strong negative relationship
- -.70 or lower: very strong negative relationship
Reference: http://faculty.quinnipiac.edu/libarts/polsci/Statistics.html
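The correlation scale can be encoded directly so each feature-quality correlation gets a consistent label. A minimal sketch, assuming that a value falling between two listed ranges (for example 0.695) takes the weaker label and that r = 0 counts as negligible; pearson_r and describe are hypothetical helper names, and the data in the usage line are made up.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of the same length (>= 2)")
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / sqrt(sxx * syy)


# Lower bounds of |r| for each label, following the scale listed above.
STRENGTH = [(0.70, "very strong"), (0.40, "strong"), (0.30, "moderate"), (0.20, "weak")]


def describe(r):
    """Map a correlation coefficient to its verbal label."""
    size = abs(r)
    for bound, label in STRENGTH:
        if size >= bound:
            direction = "positive" if r > 0 else "negative"
            return f"{label} {direction} relationship"
    return "no or negligible relationship"


r = pearson_r([1, 2, 3, 4], [2, 4, 6, 8])
print(r, describe(r))   # 1.0 very strong positive relationship
```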